P. Alexander Burnham
The following is a report for my analysis of the 2017 USDA NASS dataset and USDA ARMS survey. The goal is to explore the distribution of new and beginning farmers and their income and wealth statistics.
Use the 2017 USDA NASS dataset to examine new and beginning farmer (less than 11 years of experience) principal operators, as well as other relevant NASS data to answer ONE of the following questions: In which states are the greatest proportion of farmers considered new and beginning?
Load In Data from NASS Dataset
Here I selected the number of principle producers from each state with
less than and greater than or equal to 11 years of experience. The sum
of the two being the total number of principle producers in each state.
I am using an an interface in R to access the NASS API using an API key
provided by usda.gov. This ensures the code operates without a separate
data repository and ensures the data are up to date when accessed. As
the files are 5gb this reduces my overhead and simplifies
reproducibility for other potential collaborators.
# get principle producers with less than 11 years experience -state level
greatThan_11 <- nass_data(year = 2017,
short_desc = "PRODUCERS, PRINCIPAL, YEARS ON ANY OPERATION, LT 11 YEARS - NUMBER OF PRODUCERS",
agg_level_desc = "STATE")
# get principle producers with greater than or equal to 11 years experience -state level
lessThan_11 <- nass_data(year = 2017,
short_desc = "PRODUCERS, PRINCIPAL, YEARS ON ANY OPERATION, GE 11 YEARS - NUMBER OF PRODUCERS",
agg_level_desc = "STATE")
Merge datasets, select required columns
Here I do the merge operation to combine the data and simplify the
dataset for my purposes.
# change value to greater11
greatThan_11 <- greatThan_11 %>%
rename("greater11" = "Value")
# change value to less11
lessThan_11 <- lessThan_11 %>%
rename("less11" = "Value")
# merge datasets by state alpha
experienceMerged <- merge(greatThan_11, lessThan_11, by = "state_alpha")
# select state alpha, greater11, and less11
experienceClean <- dplyr::select(experienceMerged, less11, greater11, state_alpha, state_fips_code.x)
Create percentage variable
Here I ensure the variables of interest are numeric and create the
percentage variable required.
# remove commas and convert to numeric vectors
experienceClean$less11 <- as.numeric(gsub(",","",experienceClean$less11))
experienceClean$greater11 <- as.numeric(gsub(",","",experienceClean$greater11))
experienceClean$fips <- as.numeric(experienceClean$state_fips_code.x)
# calculate the percentage here
experienceClean$percentNew <- (experienceClean$less11/(experienceClean$less11 + experienceClean$greater11)) * 100
Let’s make a choropleth to explore the spatial distribution
of this variable.
Here I implement a plotly choropleth. It has some interactivity built
into it. The ability to hover over states to see the state code and
value is included as well as basic zooming and saving features. I find
it to be a good exploratory plotting tool.
Let’s plot a ranked bar plot with an average threshold
line
While the spatial plot is very helpful for looking for potential
geographic patterns, it makes looking for max and min values more
difficult at a glance. Here I calculated the national proportion of new
primary producers to to all producers. I use a proportion test to
calculate the confidence interval and find three sigma to find the
interval between which 99.7 of the data will be found in a normal
distribution.
# find average percentage nation wide
new <- sum(experienceClean$less11)
total <- (sum(experienceClean$less11) + sum(experienceClean$greater11))
# find average
avg <- (new / total)*100
# find confidence interval to calculate 3 sigma
prop <- prop.test(new, total, correct=FALSE)
threeSig <- (((prop$conf.int[2] - prop$conf.int[1])/4)*3)*100
# create a variable for our three sigma calculation
experienceClean$threeSigCat <- ifelse(experienceClean$percentNew>avg+threeSig, "Above", ifelse(experienceClean$percentNew<avg-threeSig, "Below", "Average"))
Is there a difference between the state’s
proportions?
With a significant p-value (p < 0.0001) I reject the null hypothesis
and there appears to be evidence that the states differ in the
proportion of new and beginning farmers.
##
## 50-sample test for equality of proportions without continuity
## correction
##
## data: experienceClean$less11 out of (experienceClean$less11 + experienceClean$greater11)
## X-squared = 15825, df = 49, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## prop 1 prop 2 prop 3 prop 4 prop 5 prop 6 prop 7 prop 8
## 0.5676856 0.7163749 0.7286361 0.7947788 0.7414086 0.7149393 0.7191781 0.8161249
## prop 9 prop 10 prop 11 prop 12 prop 13 prop 14 prop 15 prop 16
## 0.7036227 0.6845286 0.7041925 0.8036842 0.7232812 0.7891885 0.7767058 0.7793540
## prop 17 prop 18 prop 19 prop 20 prop 21 prop 22 prop 23 prop 24
## 0.7352635 0.7201429 0.7502462 0.7596422 0.6946287 0.7686988 0.8108141 0.7619867
## prop 25 prop 26 prop 27 prop 28 prop 29 prop 30 prop 31 prop 32
## 0.7387001 0.7892362 0.7479764 0.8123275 0.8006048 0.7174619 0.7757881 0.7635868
## prop 33 prop 34 prop 35 prop 36 prop 37 prop 38 prop 39 prop 40
## 0.7381697 0.7603600 0.7631999 0.7207460 0.7385124 0.7761268 0.7322558 0.7310627
## prop 41 prop 42 prop 43 prop 44 prop 45 prop 46 prop 47 prop 48
## 0.8132155 0.7493078 0.7272972 0.7345957 0.7548948 0.7194515 0.7522374 0.8048102
## prop 49 prop 50
## 0.7108951 0.7363196
The top five states with the highest percentage of new farmers in descending rank order were Delaware, South Dakota, North Dakota, Minnesota, and Wisconsin. The national percentage was 75.37% new and beginning farmers. In general, it seemed as though the highest influx of new farmers has been in the northern Midwest. As one of the largest farming centers in the country, this makes some sense. All states where either significantly higher or lower (3-sigma = 0.077%) than the national estimate. 21 states were below the 3 sigma margin and the remaining 29 were above. This may indicate that farming, varies culturally, environmentally, legislatively etc. so much from state to state that the national average is actually a poor population parameter estimate to compare to individual states. A proportion test indicated that there was a highly significant difference between the proportions of new and beginning farmers among states (p < 0.0001) In future and with more time, more complicated statistical models should be developed with other meaningful social, political and geographic parameters at the state level in order to examine further patterns in this variable.
Using data from the USDA ARMS survey on farm income and wealth statistics, assess the relationship between new and beginning farmers and federal government direct farm payments. To what extent is there a relationship between new and beginning farmer populations and federal government payments? Where are these payments most common?
Get Data from the ARMS API
Here I use a REST POST request to grab the data from the USDA ARMS API
as a json file. I selected the three most recent years where direct
payments were still being disbursed (ended in 2014). These are not
perfectly in line with our 2017 new and beginning data but taking time
lags and the speed of farmer turnover into account, some interesting
patterns may still be observable. I requested state-level direct payment
data and included economic class and production specialty as
covariates.
# my API endpoint
https://api.ers.usda.gov/data/arms/surveydata?api_key=My_Key
# my POST request:
{
"year": [2011,2012,2013],
"state": ["Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"],
"report": "government payments",
"category": ["economic class", "production specialty"],
"variable": "direct payments"
}
Read in Data and Select Relevant Columns
Here I read in the data requested and select the columns for the
analysis. The estimate variable is in 1000s of dollars so I create a new
column taking this scaling into account.
# Loading packages
library(jsonlite)
# read in data as a data frame
arms <- jsonlite::fromJSON("ARMS_direct_payment.json")$data
# select data
armsClean <- dplyr::select(arms, year, state, category, category_value, estimate)
# estimate in 1000s of dollars
armsClean$payments <- armsClean$estimate * 1000
# Split data by category
split <- split(armsClean, armsClean$category)
specialty <- dplyr::select(split$`Production Specialty`, year, state, payments, category_value)
class <- dplyr::select(split$`Economic Class`, year, state, payments, category_value)
Lets plot payments by year for both categories
I summarize both datasets by year, state, and category_value using dplyr
group_by.
## # A tibble: 6 × 6
## # Groups: year [2]
## year category_value mean sd n se
## <int> <chr> <dbl> <dbl> <int> <dbl>
## 1 2011 $1,000,000 or more 50601333. 34663388. 15 8950048.
## 2 2011 $100,000 to $249,999 18243400 16331102. 15 4216672.
## 3 2011 $250,000 to $499,999 29732600 23543001. 15 6078777.
## 4 2011 $500,000 to $999,999 44638800 33774248. 15 8720473.
## 5 2011 Less than $100,000 14070600 11217099. 15 2896243.
## 6 2012 $1,000,000 or more 70089667. 44077898. 15 11380864.
Let’s run a few quick statistical models to examine the patterns observed in those two figures above. The distribution is heavily skewed right for both datasets. I will model the distributions using a gamma distribution to account for this.
# first with economic class
modClass <- glm(data = class, payments~year * category_value, family = "Gamma")
Anova(modClass, test.statistic="LR") # likelihood ratio test
## Analysis of Deviance Table (Type II tests)
##
## Response: payments
## LR Chisq Df Pr(>Chisq)
## year 0.081 1 0.7754
## category_value 102.114 4 <2e-16 ***
## year:category_value 3.875 4 0.4232
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# remove 0s for gamma dist.
specNo0 <- specialty[specialty$payments>0,]
# next producer specialty
modProd <- glm(data = specNo0, payments~year * category_value, family = "Gamma")
Anova(modProd, test.statistic="LR") # likelihood ratio test
## Analysis of Deviance Table (Type II tests)
##
## Response: payments
## LR Chisq Df Pr(>Chisq)
## year 0.10 1 0.754
## category_value 367.32 11 <2e-16 ***
## year:category_value 5.31 11 0.915
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Merge the datasets from questions 1 and 2
I will examine a state’s proportion of new farmers to the amount of
money in direct payments received in 2013 (the most recent year it was
issued). I will use the economic class dataset in this merge with
economic class as a potential covariate of interest.
# add state name to experience dataset
experienceClean$state <- state.name[match(experienceClean$state_alpha,state.abb)]
# select experience vars for merge
expSimple <- dplyr::select(experienceClean, percentNew, state)
# do the merge for all x
expClass <- merge(x=class, y=expSimple, all.x=T, all.y = F, by = "state")
Let’s Examine the Significance of this Apparent Trend
# Experience model
modExp <- glm(data = expClass[expClass$year==2013,], payments~percentNew * category_value, family = "Gamma")
Anova(modExp, test.statistic="LR") # likelihood ratio test
## Analysis of Deviance Table (Type II tests)
##
## Response: payments
## LR Chisq Df Pr(>Chisq)
## percentNew 21.734 1 3.133e-06 ***
## category_value 46.025 4 2.434e-09 ***
## percentNew:category_value 23.198 4 0.0001156 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s pull the count data for direct payments at the state
level
Which states had the highest number of payments in 2013?
## # A tibble: 15 × 2
## state sum
## <chr> <dbl>
## 1 Iowa 44279
## 2 Illinois 34246
## 3 Minnesota 32844
## 4 Wisconsin 23590
## 5 Nebraska 23149
## 6 Kansas 19587
## 7 Indiana 19026
## 8 Missouri 18456
## 9 Texas 12449
## 10 Georgia 5387
## 11 North Carolina 5004
## 12 Arkansas 4161
## 13 Washington 2435
## 14 California 2369
## 15 Florida 562
Only 15 states in the last 3 years of the program received funding. Arkansas, California, Florida, Georgia, Illinois, Indiana, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Carolina, Texas, Washington, and Wisconsin. There were significant effects of economic class and production specialty on government direct payments (p < 0.0001), however, for the years I selected (2011-2013) there was no effect of year detected. The interaction effects of year on economic class and production specialty were also insignificant with this data selection. On average, farms with a higher economic class value (potentially larger commercial farms) and corn producers received the highest payments overall. To determine if these were significant, a corrected pairwise comparisons post-hoc test should be conducted. On average, states with higher percent new and beginning farmers in 2017 received higher direct payments in 2013 (x^2 = 21.7, p < 0.0001). There was also a significant difference in average payments between economic classes (x^2 = 46.0, p < 0.0001). A significant interaction between economic class and percent new primary producers indicates that the positive linear trends between these variables differ based on the economic class they are associated with (x^2 = 23.2, p = 0.0001). The states with the highest frequency of direct payments were Iowa, Illinois, Minnesota, Wisconsin, and Nebraska in decreasing rank order. The highest dollar amounts in terms of direct payments were also centered around Midwestern states like Illinois, Nebraska, Iowa. These were correlated with the highest corn producing states and the areas of highest percent new farmers as indicated by our choropleth. These significant effects of percent new and beginning farmers on direct payouts are likely correlated spatially with significant areas of farming. Further work would need to be done to attempt to parse the true nature of this relationship.